Discovering Consensus Patterns in Biological Databases
Abstract
Consensus patterns, such as motifs and tandem repeats, are highly conserved patterns that contain very few substitutions and allow no gaps. In this paper, we present a progressive hierarchical clustering technique for discovering consensus patterns in biological databases over a given length range. By applying a post-processing phase, the technique can discover consensus patterns that satisfy various requirements. The progressive nature of the hierarchical clustering algorithm makes it scalable and efficient. Experiments on discovering motifs and tandem repeats in real biological databases show significant performance gains over non-progressive clustering techniques.
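As a rough illustration of the definition in the abstract (few substitutions, no gaps), the sketch below derives a column-wise consensus from equal-length occurrences and checks each occurrence against it by Hamming distance. It is not the paper's progressive hierarchical clustering algorithm. The function names, the `max_substitutions` threshold, and the alphabetical tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count substitutions between two equal-length strings (no gaps allowed)."""
    if len(a) != len(b):
        raise ValueError("gap-free comparison needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def consensus(occurrences: list[str]) -> str:
    """Build a consensus string by majority vote in each column.

    Ties are broken alphabetically; this is an assumption made only to keep
    the result deterministic.
    """
    if not occurrences:
        raise ValueError("need at least one occurrence")
    if len({len(o) for o in occurrences}) != 1:
        raise ValueError("all occurrences must have the same length")
    result = []
    for column in zip(*occurrences):
        counts = Counter(column)
        top = max(counts.values())
        result.append(min(sym for sym, n in counts.items() if n == top))
    return "".join(result)


def is_conserved(occurrences: list[str], max_substitutions: int) -> bool:
    """True if every occurrence is within max_substitutions of the consensus."""
    pattern = consensus(occurrences)
    return all(hamming(pattern, o) <= max_substitutions for o in occurrences)


if __name__ == "__main__":
    hits = ["ACGT", "ACGA", "TCGT"]
    print(consensus(hits))
    print(is_conserved(hits, max_substitutions=1))
```

Hamming distance is used here because the abstract rules out gaps, so occurrences can be compared position by position without alignment.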
Similar resources
An Introduction to Policy Delphi; A tool for discovering the opposing views on health policy issues
Objective: In this review, we investigated various aspects of Policy Delphi technique to make decision-makers more aware of this pertinent method so that they can use it in their policy decisions in their organizations. Information sources and selected methods for study: This study was conducted using a review method and by searching the related literature in databases such as PubMed, Scopus a...
Incremental Mining for Frequent Patterns in Evolving Time Series Datatabases
Several emerging applications warrant mining and discovering hidden frequent patterns in time series databases, e.g., sensor networks, environment monitoring, and inventory stock monitoring. Time series databases are characterized by two features: (1) The continuous arrival of data and (2) the time dimension. These features raise new challenges for data mining such as the need for online proces...
Discovering Sequence Motifs of Different Patterns Parallely using DNA Operations
Discovery of motifs in biological sequences and various types of subsequences in commercial databases have varied applications and interpretations. This paper proposes a new approach to
Mining association rules from biological databases
area such as bioinformatics. This methodology allows the identification of relationships between low-magnitude similarity (LMS) sequence patterns and other well-contrasted protein characteristics, such as those described by database annotations. We start with the identification of these signals inside protein sequences by exhaustive database searching and automatic pattern recognition strategie...
Automated Discovery of Protein Motifs With Genetic Programming
Automated methods of machine learning may prove to be useful in discovering biologically meaningful information hidden in the rapidly growing databases of DNA sequences and protein sequences. Genetic programming is an extension of the genetic algorithm in which a population of computer programs is bred, over a series of generations, in order to solve a problem. Genetic programming is capable of...